Data processing apparatus that uses compression for data stored in memory

ABSTRACT

Data, such as an image, is made up of data-items (pixels) that are each associated with a respective data address. Compressed blocks representing the data are stored in a memory system. Each block represents compressed data-items associated with data addresses in a respective sub-range of addresses of the data. Each block starts from a respective preferred starting address for multi-address transfer. The sub-range of addresses of each block has a length corresponding to an address distance between the preferred starting addresses, so that, due to compression, memory addresses not occupied by a block are left between blocks. A decompressor is coupled between a processing element and the memory system. The decompressor starts a multi-address memory transfer of a required one of the blocks from the memory system dynamically when the processing element requires access to the block, leaving untransferred the memory addresses that directly follow the block up to the preferred starting address of the next block. The transferred data is decompressed and passed to the processor.

The invention relates to a data processing apparatus that uses data compression for data stored in memory.

From U.S. Pat. No. 6,173,381 a data processing system is known with a processor and a system memory that are connected via a bus. Data, such as image data, may be stored in compressed or uncompressed form in the system memory. The processor is connected to the system memory via an integrated memory controller that compresses and decompresses the data when it is written to and read from the system memory. U.S. Pat. No. 6,173,381 teaches how compression is used to reduce memory occupation and bus bandwidth, because storage of data in compressed form takes fewer memory locations than needed for the same data in uncompressed form.

Storing data in compressed form can interfere with processing of the data, when that processing requires addressing of different locations within the data. Because of compression, and especially variable length compression, the address distances between different items in the uncompressed data are not preserved in the compressed data. U.S. Pat. No. 6,173,381 solves this problem by using a cache memory between the processor and the integrated memory controller, to store decompressed data in cache. Thus, the decompressed data can be addressed by the processor in the cache memory using virtual addresses of decompressed data. The integrated memory controller has to ensure that the compressed data is read and written at the appropriate system memory addresses during cache fetch or write back. U.S. Pat. No. 6,173,381 does not describe how the compressed data is appropriately addressed, but presumably the virtual address of decompressed data issued by the processor is translated into a physical address of the compressed form of the data, and the data is written to or read from these physical addresses. Translation of virtual addresses into physical addresses may slow down processing.

In many modern data processing systems data is retrieved in bus transfers where a block with a large number of addressable words, for example up to 64 or 128 bytes, can be transferred between memory and a processor in response to each single address. Such transfers must start from specific starting addresses (called preferred starting addresses hereinafter), for example addresses at 128 byte block boundaries (addresses of which a number of least significant bits are zero), typically at equal distance from one another; or at least additional overhead is needed if the transfer has to start from an address that is not a preferred starting address. The length of the transfer can be selected. This provides for an increase of memory bandwidth. In known processors, this number of words is not related to compression parameters.

Among others, it is an object of the invention to provide for a data processing apparatus and method in which the bus bandwidth needed for accessing data is reduced by compression without complicating access to different addressable parts of the data.

Among others, it is an object of the invention to provide for a data processing apparatus and method in which the bus bandwidth needed for accessing image and/or audio data is reduced by compression without complicating access to different addressable parts of the data.

Among others, it is an object of the invention to provide for a data processing apparatus and method in which the bus bandwidth used for processes that use decompressed data can be adapted dynamically.

The data processing apparatus according to the invention is set forth in Claim 1. The apparatus processes data-items that are each associated with a respective data address in a range of data addresses, such as pixels in an image with associated x,y addresses or temporal data associated with sampling instants t_n. Compressed blocks are used that each represent data-items from a respective sub-range of the range of data addresses. The lengths of the sub-ranges are selected so that they correspond to the distance between pairs of preferred starting memory addresses for multi-address memory transfers. Preferably, each sub-range has the same length. The compressed blocks are stored in the memory system, each starting from a preferred starting memory address, so that the address distance to the starting memory address of the next block corresponds to the length of the sub-range of data addresses associated with the data-items in the block.

Thus, it is made possible to reduce the memory access bandwidth for storing and retrieving the blocks, by using multi-address memory transfers that are terminated when a block has been transferred. Because the distance between the starting addresses of the blocks is the same as for the uncompressed data, the starting addresses of the transfers can be determined directly from the data addresses of the required uncompressed data items, for example by taking a more significant part of the data address. As a result the range of memory addresses over which the compressed blocks are stored is substantially the same as required for the uncompressed data-items. Thus no reduction of the address range of occupied memory is realized, but only a reduction in bandwidth usage.
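By way of illustration only, taking "a more significant part of the data address" can be sketched as follows. The block size of 128 and the function name split_data_address are assumptions made for this sketch and do not appear in the description; any power of two would serve.

```python
# Assumed distance between preferred starting addresses (hypothetical value).
BLOCK_SIZE = 128


def split_data_address(data_address: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Split a data address into (block starting address, offset within the block).

    The block starting address is the data address with its least significant
    bits cleared; the offset is given by those least significant bits.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    offset = data_address & (block_size - 1)
    block_start = data_address & ~(block_size - 1)
    return block_start, offset
```

Because compressed blocks start at the same addresses as the uncompressed blocks would, no table lookup or address translation is involved in this sketch.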

A processing element applies processing operations, such as filtering, to these data-items. Typically, the processing element addresses the data-items with the data addresses (possibly modified with some offset), but it is also possible that the processor uses the data addresses only implicitly, for example by calling for data-items that have adjacent data addresses merely by indicating that a next data-item is needed. Preferably, decompressed data for all data addresses within the decompressed block is stored in a buffer for such retrieval, but alternatively it is possible to decompress each time only the addressed data within the block. The memory system is for example a single semiconductor memory with attached memory bus, or any combination of memories that cooperate to supply data in response to addresses.

When the blocks of compressed data are retrieved for decompression the length of multi-address memory transfers is selected dependent on the actual block sizes. During a memory transfer, the transfer is terminated when the data from the block of compressed data has been transferred, before the data up until the start of the next block has been transferred. Thus, blocks of compressed data can be retrieved with minimum bus bandwidth and be addressed without requiring knowledge of the size of other blocks of compressed data.

The length of the sub-range of addresses of which the data is compressed together into a compressed block preferably is equal to the distance between a pair of successive preferred starting memory addresses. This enables more efficient memory bus utilisation and potentially reduces memory access latency. However, without deviating from the invention a sub-range may extend over a plurality of distances between successive preferred starting memory addresses. This provides for higher compression ratios and therefore less memory bandwidth. In this case a plurality of multi-address memory transfers may be used to transfer one block.

Information about the lengths of the blocks of compressed data is preferably stored with the blocks. Thus, these lengths automatically become available when the blocks are transferred, without requiring further memory addressing. In one embodiment length information for a block of compressed data is stored with the block itself. Thus, a signal can be generated to end the transfer on the basis of information in the block itself. In another embodiment length information for a logically next block of compressed data is stored with a block of compressed data. (By a logically next block is meant a block that is accessed next by the processing element, e.g. blocks are logically next to each other when they encode image data for adjacent image regions.) Thus, length information becomes available for setting the transfer length for a block before the block is addressed. This is useful when the transfer length must be set at the start of each transfer.

Preferably a scalable decompressing technique is used, in which the quality of decompression can be adapted by using a greater or smaller length of the block. Thus, bandwidth use can be adapted dynamically at the expense of decompression quality by adapting the length of the transfer of data from a block.

Preferably lossy compression is used, in particular when the data is intended for rendering for human perception (e.g. image data or audio data). After lossy compression the data generally cannot be reconstructed exactly by decompression, but it delivers the same perceived content to a greater or lesser extent, dependent on the compression ratio. In an embodiment, the compression ratio is adapted dynamically, dependent on the dynamically available memory bandwidth.

In another embodiment different decompression options are available that reconstruct the data with increasingly less accuracy, using increasingly less data, so that by terminating memory transfers sooner less bandwidth may be used at the expense of less accuracy.

These and other objects and advantageous aspects of the invention will be described using the following figures.

FIG. 1 shows a data processing apparatus

FIG. 2 illustrates memory access

FIG. 3 shows memory occupation

FIG. 4 shows a processing element

FIG. 5 shows memory occupation

FIG. 1 shows a data processing apparatus with a memory 10, and a number of processing elements 14 (only two shown by way of example) interconnected via a bus 12. The processing elements 14 contain a processor 140, a decompressor 142 and a compressor 144. Processor 140 is coupled to bus 12 via decompressor 142 and compressor 144. In the context of the present application memory 10 and bus 12 are said to be part of a memory system that provides access to data in memory 10.

FIG. 2 illustrates a memory transfer involving memory 10 via bus 12 during operation of the apparatus of FIG. 1. By way of example, FIG. 2 illustrates a separate address signal 20, a data signal 22 and an end signal 24. In order to read or write data from or to memory 10, processing element 14 first outputs a block address 21 in address signal 20. Subsequently a number of words of data 23 is transferred for the block address 21. In case of a read operation words of data 23 are data words from successive memory locations with addresses starting from the block address 21. In case of a write operation words of data 23 are data words from processing element 14 that have to be written in successive memory locations with addresses starting from the block address 21.

After transfer of a number of words of data 23 processing element 14 generates an end signal 25 indicating the termination of the memory transfer for the block address 21, and availability of bus 12 for a next memory transfer at a next block address 27. Thus, data words 23 are transmitted during a time-slot 26, the length of which is controlled by processing element 14. (It will be appreciated that in the actual implementation types of signals may be used that differ from address signal 20, data signal 22 and/or end signal 24, but represent the same information. For example, the end signal may be represented by a length code transmitted at the start of the transfer.)

FIG. 3 shows actual memory occupation 30 in memory 10 and virtual memory occupation 32 as seen by processors 140. Memory 10 is shown organized into blocks 300 a-d, the blocks 300 a-d being shown one above the other. The length of the blocks corresponds to the number of words between successive locations that can be addressed by different block addresses 21. Typically, the length is a power of 2, for example 64 words or 128 words per block.

In one embodiment, a memory 10 (known per se) is used, which is constructed so that multi-address memory transfers can start only from block boundary addresses, e.g. from addresses that are 128 bytes or 256 bytes apart, of which the last 7 or 8 bits of the address are zero. In response to a request for a multi-address memory transfer, the memory internally generates signals that effect the equivalent of successively addressing locations in the memory whose addresses have different values of the less significant bits of the address. The architecture of such memory systems is designed to deliver optimal performance (in terms of bus utilization and latency) for this type of access from the start of a line. This applies both to reading and writing. The starting addresses in this embodiment will be referred to by the term "preferred starting addresses", although they are in fact the only possible starting addresses for multi-address memory transfers.

In another embodiment a memory (known per se) is used which is constructed so that the least significant part of the starting address of a multi-address memory transfer may optionally be used to select the starting address of the multi-address memory transfer, at the expense of at least an additional memory clock cycle. In this case, a signal is sent to memory 10 not to use this additional clock cycle, but to start the multi-address memory transfer immediately from a standard starting address with minimum overhead, without using one or more additional clock cycles for an adapted starting address. The term "preferred starting address" will be used to refer to these standard addresses in this embodiment. Of course, both embodiments may have further embodiments in which a maximum transfer length may be imposed by the distance between successive preferred starting addresses, so that a new multi-address transfer has to be started for each preferred starting address if a block to be transferred extends over more than one starting address, but the invention is not limited to such further embodiments.

Preferably, the compression block size is selected so that the address distance between successive blocks of uncompressed data is equal to the distance between a pair of preferred starting addresses for a multi-address memory transfer. In many compression algorithms the block size can be adjusted, or compression blocks can be combined into larger blocks, so that the required block size, as defined by the memory architecture, can be realized. As discussed in the following, compression block size may alternatively be set to an integer multiple of this memory system block size. When the compressed data from the blocks is decompressed, each block of decompressed data has a length corresponding to the distance between a pair of preferred starting addresses in memory 10. Preferably all blocks of decompressed data have the same length.

Those memory locations in actual memory occupation 30 that are occupied by compressed data are shown as hatched areas. As shown in actual memory occupation 30, varying parts of memory transfer units 300 a-d are left unoccupied by compressed data, when variable length compression is used.

A processing element 14 contains a decompressor 142 and a compressor 144. Decompressor 142 retrieves compressed data from memory 10 via bus 12 by supplying a block address 21 of a block of compressed data and generating an end signal 25 to terminate the memory transfer when all the compressed data from the addressed block has been transferred, but before the content of the entire physical memory transfer unit has been transferred. Decompressor 142 decompresses the retrieved data from the addressed block and supplies the decompressed data to processor 140.

Similarly, compressor 144 compresses data produced by processor 140 and writes the compressed data to memory 10 via bus 12. In this case compressor 144 supplies a single block address 21 for a block of compressed data, transmits the words of compressed data from the compressed block and sends a signal to terminate transfer for the block address 21 when the number of words that represents the compressed data has been transmitted, before all words in the physical memory transfer unit have been overwritten.

Processor 140 addresses data in the blocks in terms of addresses of decompressed data. That is, the data address is generally composed of a block address of a decompressed block and a word address within the decompressed block. The word address can assume any value up to the predetermined decompressed block size. Thus, to processor 140, the address space appears as shown in virtual memory occupation 32, wherein each block 320 a-d occupies the same predetermined number of locations. When processor 140 issues a read request it supplies the data address to decompressor 142. Unless the addressed data has been cached, decompressor 142 uses the block address part of the data address to address memory 10 via bus 12. Subsequently, decompressor 142 retrieves from the addressed block the actual number of words that is needed to represent the compressed block, the memory transfer being terminated once this actual number has been transferred, but generally before the full predetermined length of the block has been transferred. Decompressor 142 decompresses the retrieved data, selects the data addressed by the data address from processor 140 and returns the selected data to processor 140.

Preferably, decompressor 142 contains a buffer memory (not shown separately) for storing data for all data addresses of the decompressed block. When the block is decompressed, decompressed data is written to all these locations and the data addressed by processor 140 is provided to processor 140 from these locations. Alternatively, each time only the addressed word from the data may be decompressed, or a subset of the words including the addressed word. Generally it will require little additional effort to decompress all words of a block instead of just one word, and by buffering all words access latency is decreased on average. However, it should be understood that in an embodiment the compressed block may be made up of sub-blocks that can be decompressed independently of one another. In this case the decompressed data for one sub-block may overwrite the data of another sub-block from the same block in the buffer memory, when data from the one sub-block is needed, without fetching a new block from memory 10.

When processor 140 writes data, processor 140 supplies for the write data a data address that is used by compressor 144. Typically, compressor 144 stores data from a complete uncompressed block, uses the write data to replace this uncompressed data at the address that is addressed by the data address, later compresses the data and writes the compressed data to memory 10 using the block address from the data address used by processor 140. Compressor 144 terminates the transfer when the compressed data for the block address has been transferred, generally before the predetermined number of words has been transferred to memory 10 that corresponds to the distance between successive block addresses.

As a result, when processor 140 addresses substantially the entire decompressed data the number of words that has to be transferred via bus 12 between processing element 14 and memory 10 is smaller than the total number of words in the decompressed data, leaving more bus and memory bandwidth for other transfers. The memory space occupied by compressed data is generally not reduced by using compressed data, since unoccupied space is left behind each compressed block in memory 10, to permit used block addresses of decompressed blocks to be used as block addresses for retrieving compressed blocks.
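The trade-off between bus traffic and memory span can be illustrated with a small sketch. The function name layout_blocks, the use of None for unoccupied words and the example values are assumptions made for illustration only.

```python
def layout_blocks(compressed_blocks, block_size):
    """Place each compressed block at the start of its own fixed-size slot.

    Words of a slot that are not occupied by compressed data are represented
    by None; they stay in memory but need not be transferred over the bus.
    """
    memory = []
    for block in compressed_blocks:
        if len(block) > block_size:
            raise ValueError("compressed block does not fit in its slot")
        memory.extend(block)
        memory.extend([None] * (block_size - len(block)))  # gap behind the block
    return memory


# Three compressed blocks of varying length, in slots of 8 words each.
blocks = [[1, 2, 3], [4], [5, 6, 7, 8, 9]]
memory = layout_blocks(blocks, block_size=8)
words_on_bus = sum(len(b) for b in blocks)  # only occupied words are transferred
address_span = len(memory)                  # same span as the uncompressed data
```

In this example 9 words cross the bus instead of 24, while the occupied address span remains 24 words and block i still starts at address i times 8.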

In one example, a compressed video image is stored distributed over a plurality of successive compressed blocks in memory. After decompression, processor 140 addresses pixels of this image individually. In this case the distance between the lowest and highest address of the memory locations occupied by the compressed image is substantially the same as that needed for storing the uncompressed image, again because the unused memory locations are left at the end of each compressed block 300 a-d. In this case, a video display device, such as a television monitor, may be coupled to memory 10 via a decompressor and bus 12, or a video source, such as a camera or a cable input, may be coupled to memory 10 via a compressor and bus 12.

Compressor 144 and decompressor 142 preferably make use of variable length compression, which adapts the length of the compressed data in each compressed block to the particular uncompressed data in the block. This makes it possible to minimize memory and bus bandwidth use.

In case of image data or other sensory data such as audio data, lossy compression may be used, which compresses the data at the expense of some information loss. This also makes it possible to minimize memory and bus bandwidth use. In an embodiment the compression ratio (and thereby the amount of loss) is dynamically adapted to the dynamically available bus bandwidth. In this embodiment a bus monitor device (not shown) may be coupled to bus 12 to determine the bandwidth use. This can be realized for example when processing elements 14 are designed to send signals to the bus monitor to indicate a requested bandwidth use, or when the bus monitor counts the number of unused bus cycles per time unit. The bus monitor is coupled to compressor 144 to set the compression ratio in compressor 144, either dynamically, or in response to a request from a processing element 14 to start writing compressed data.

Preferably, compressor 144 includes a length code in each block of compressed data, to indicate the number of words in the block of compressed data. The length code is included for example in a first word of the compressed block, preceding the compressed data. Thus the format of a block is

(length code of block, compressed data)

When decompressor 142 uses a block address to retrieve a compressed block, decompressor 142 reads the length code from the compressed block and uses the length code to signal to memory 10 after how many words the memory transfer for the block address may be terminated.
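A minimal sketch of this block format follows, under the assumption that memory is modelled as a Python list of words and that the length code counts the compressed data words that follow it (whether the code counts itself is not specified above). The names write_block and read_block are hypothetical.

```python
def write_block(memory, block_start, payload, block_size):
    """Store (length code, compressed data) starting at a preferred starting address."""
    if 1 + len(payload) > block_size:
        raise ValueError("length code plus compressed data exceed the slot")
    memory[block_start] = len(payload)  # assumed: code = number of data words
    memory[block_start + 1:block_start + 1 + len(payload)] = payload


def read_block(memory, block_start):
    """Read the length code first, then only as many words as it announces.

    Returns the compressed data and the number of words transferred,
    which includes the word holding the length code.
    """
    length = memory[block_start]
    payload = memory[block_start + 1:block_start + 1 + length]
    return payload, 1 + length
```

For a slot of 8 words holding 3 words of compressed data, read_block transfers 4 words and leaves the remaining 4 words of the slot untransferred.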

As an alternative, compressor 144 may be arranged to store the length code for each particular compressed block in a preceding and/or succeeding compressed block adjacent to the particular compressed block in memory 10.

(length code of preceding and/or succeeding block, compressed data)

In this case, decompressor 142 has to read the preceding or succeeding block first to determine the number of words that has to be included in the memory transfer. Because blocks are mostly transferred in the order in which they are stored in memory, decompressor 142 may usually avoid additional memory transfers to retrieve the length code by retaining the length code from a compressed block to control the length of the memory transfer for a next fetched compressed block. This makes it possible to supply the length code at the start of the memory transfer. Usually, data is accessed only in one address direction. In this case, it suffices to store in each particular compressed block the length code for the adjacent block in this one direction. In another embodiment, length codes for adjacent blocks in both directions are included to avoid separate reading of the length codes when reading in either direction. When this process of successive transfers is started, the length of the first block is unknown. In such cases, the whole uncompressed length may be transferred, which yields a small penalty for the first transfer only.
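The chaining of length codes during sequential access can be sketched as follows. The function name, the list representation of the stored codes and the value 0 for the last block (which has no successor) are assumptions of this sketch.

```python
def sequential_transfer_lengths(next_length_codes, block_size):
    """Return the transfer length used for each block when reading in order.

    next_length_codes[i] is the code stored in block i, giving the length of
    block i + 1. The length of the first block is unknown when the sequence
    starts, so the whole slot is transferred for it.
    """
    lengths = []
    known = None  # length code retained from the previously fetched block
    for code in next_length_codes:
        lengths.append(block_size if known is None else known)
        known = code
    return lengths


# Blocks of actual lengths 5, 3, 6 and 2 words in slots of 8 words:
# block 0 stores 3, block 1 stores 6, block 2 stores 2, block 3 stores 0.
lengths = sequential_transfer_lengths([3, 6, 2, 0], block_size=8)
```

Here the first transfer is 8 words long although only 5 are occupied, which is the small penalty mentioned above; every later transfer has exactly the length of its block.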

In yet another embodiment the particular compressed block for which the length code is included with a specific compressed block in memory 10 may be adapted to the expected way of addressing blocks successively: for example, if it is expected that each second decompressed block will be skipped, the length code of the second next compressed block is included with each block. In a further embodiment a next block code is included with the block to indicate the logically following block for which the length code is included. The block format is now for example

(code identifying logically following block, length code of logically following block, compressed data for current block)

In an embodiment where compressed image data is stored, for example, it may be desirable to skip every second image line when an interlaced image is accessed. Accordingly the length code at the end of each image line may be arranged to describe the number of compressed words for the start of the second next image line.

FIG. 4 shows an embodiment of a processing element with a cache memory 40 and a cache management unit 42. Cache memory 40 is coupled between processor 140 on one hand and compressor 144 and decompressor 142 on the other hand. In operation, cache memory 40 stores one or more blocks of decompressed data, plus information about the address of the cached blocks. When processor 140 addresses data from cached blocks no access to bus 12 is needed. When processor 140 addresses data that is not in cache memory 40, cache management unit 42 triggers decompressor 142 to retrieve the compressed block from which the addressed data can be retrieved after decompression. Decompressor 142 decompresses the retrieved block and writes the decompressed block to cache memory 40, so that it may subsequently be addressed.

If necessary, cache management unit 42 creates room in cache memory 40 by reusing cache memory space used for a previous block of uncompressed data. When processor 140 has updated data in this block, cache management unit 42 first signals compressor 144 to compress the uncompressed block and to write the compressed block to memory 10 (not shown). Various conventional cache write back strategies may be used, such as write through (compressing and writing each time when processor 140 updates a data word in cache memory 40), or write back (only when cache space for a new uncompressed block is needed).

It may be noted that upon writing a block of compressed data to memory 10, compressor 144 generally needs the entire block of decompressed data, even if only one word has been updated by processor 140. Hence, in order to write a data word it may be necessary to retrieve the block of compressed data from memory 10, to decompress the block of compressed data (preferably using decompressor 142), to update the relevant data word or words in the block of decompressed data, to compress the updated block and to write back the compressed block. However, usually a number of different data words of the uncompressed block is updated successively. Preferably write back occurs only when processing of the uncompressed block has been completed. Often, moreover, all data in the decompressed block is updated, so that no decompression of an old block is needed.
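The retrieve, decompress, update, compress and write back sequence can be sketched as below. This is an illustration under stated assumptions, not the compressor of the description: Python's zlib (a lossless codec) stands in for compressor 144 and decompressor 142, a dictionary keyed by block starting address stands in for memory 10, and the block size of 128 bytes and the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 128  # assumed size in bytes of a decompressed block


def write_word(store: dict, data_address: int, value: int) -> None:
    """Update one byte of a compressed block by read-modify-write."""
    offset = data_address % BLOCK_SIZE
    block_start = data_address - offset
    if block_start in store:
        # Retrieve and decompress the old block before updating it.
        block = bytearray(zlib.decompress(store[block_start]))
    else:
        block = bytearray(BLOCK_SIZE)  # no old block: start from zeros
    block[offset] = value  # value must be in range(256)
    # Compress the whole updated block and write it back.
    store[block_start] = zlib.compress(bytes(block))


def read_word(store: dict, data_address: int) -> int:
    """Decompress the block holding data_address and select the addressed byte."""
    offset = data_address % BLOCK_SIZE
    block = zlib.decompress(store[data_address - offset])
    return block[offset]
```

Every call to write_word compresses the whole block again, which is why the description prefers to write back only when processing of the uncompressed block has been completed.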

In an embodiment, compression and decompression is optional. In this embodiment both compressed and decompressed blocks may be stored in memory 10. Selection whether to compress or not may be performed by processor 140, for example by setting a compression control register (not shown) or by selecting compression and no compression when the data address is within and outside a predetermined range of addresses respectively. In case of uncompressed data compressor 144 and decompressor 142 are effectively bypassed, for example for data addresses outside one or more specific address ranges. A bit from the data address may be used for example to indicate whether the address is in or outside a range where compressed or uncompressed data is addressed.

In another embodiment, decompressor 142 is arranged to use one of a series of different decompression options that are each capable of obtaining decompressed information from the same compressed data, but using increasingly smaller subsets of the compressed data. In the memory, for each block of compressed data, data from the smallest subset is placed first, followed each time by the additional data needed to complete the next larger subset. For example, when the block is coded in terms of a series of numbers, words containing more significant bits of the numbers for the block may be placed first in memory, followed by words containing less significant bits, these, if applicable, being followed by words with even less significant bits, and so on. However, it should be appreciated that other possibilities exist, such as placing numbers that represent subsampled subsets of the block first. The different decompression options read increasingly larger subsets of the block of compressed data, with which the decompressor is able to regenerate decompressed data of increasingly higher quality. When a certain decompression option is used, the decompressor terminates memory transfer when the relevant subset of the data has been transferred. The required length of the transfer is computed from the option used and, if applicable, from a length code for the block (e.g. when more significant bits are used, the number of bits to be transferred follows from the length (the number of numbers in the block) times the number of more significant bits that is used per number). Thus bandwidth use on bus 12 is minimized.
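The ordering by significance can be illustrated with bit planes. The sketch below assumes 8-bit numbers and stores one bit per list element for readability; the function names are hypothetical and the packing of bits into memory words is left out.

```python
def to_bit_planes(numbers, bits=8):
    """Order the data by significance: plane 0 holds the most significant bit of every number."""
    return [[(n >> (bits - 1 - p)) & 1 for n in numbers] for p in range(bits)]


def from_bit_planes(planes, bits=8):
    """Rebuild the numbers from the planes that were transferred.

    Planes that were not transferred (the less significant ones) count as zero,
    so fewer planes give a coarser reconstruction.
    """
    count = len(planes[0]) if planes else 0
    values = [0] * count
    for p, plane in enumerate(planes):
        for i, bit in enumerate(plane):
            values[i] |= bit << (bits - 1 - p)
    return values


def bits_to_transfer(n_numbers, planes_used):
    """Transfer length in bits: number of numbers times more significant bits used per number."""
    return n_numbers * planes_used


numbers = [200, 17, 255, 64]
planes = to_bit_planes(numbers)
coarse = from_bit_planes(planes[:4])  # transfer terminated after 4 of 8 planes
```

With four of the eight planes the transfer is half as long, and each reconstructed number keeps only its upper four bits.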

Thus, less bus 12 bandwidth use can be realized by using decompression of increasingly lower quality. Dependent on the needs of the algorithm executed by processor 140, processor 140 selects one of the decompression algorithms and commands decompressor 142 to use the selected decompression algorithm. Thus, bandwidth use is adapted to the needs of processor 140. Also a bus manager (not shown) may be provided to determine bus bandwidth use in bus 12 (any known way of determining bandwidth use may be employed) and to send a signal to select the decompression algorithm dependent on the available bandwidth on bus 12.

In addition to data cache 40 the processing element may be provided with an instruction cache (not shown) for processor 140. Preferably, the instruction cache has a separate interface to bus 12. Instructions are preferably read without decompression, so as to minimize latency, and are cache-managed separately from the decompressed data.

In the preceding it has been described how successive compressed blocks are stored at address distances that correspond to the distance between the starting data addresses of the decompressed blocks that correspond to the compressed blocks. Preferably, the distance corresponds to the distance between a pair of successive preferred starting addresses as defined by the memory system architecture for starting a multi-address memory transfer via bus 12 in response to a single block address. However, in a further embodiment the distance corresponds to an integer multiple of this distance, i.e. to the distance between a pair of preferred starting addresses that are separated by other preferred starting addresses. If the maximum multi-address transfer length is limited by the distance between successive preferred starting addresses, the entire memory space available for a compressed block in this case cannot be addressed by a single block address 21. This means that in principle a plurality of block addresses 21 may need to be supplied to access a compressed block. Dependent on the compression ratio one or more of these block addresses may be omitted when the compressed block is transferred and/or a final number of data words that is accessible with a supplied block address may not need to be transferred.

It should be realized in this context that although the words “block of compressed data” refer to a collection of data that can be decompressed without reference to other blocks, it is not implied that all data from the compressed block is needed to decompress any word in the block. For example, a block of compressed data may comprise a number of sub-blocks of compressed data that can be decompressed independently. Similarly, if variable length coding, such as Huffman coding, is used it may be necessary to consult data for other words only to determine the starting point of the word for a particular address of uncompressed data.

FIG. 5 shows an example of physical memory occupation 50 that makes use of a greater distance between starting addresses of blocks. In this example the compression ratio is two. As a result decompressed data 520 a,b that would require two block addresses for transfer can be stored as compressed data in memory spaces 500 a,b (shown as hatched areas) with a size that can be transferred with one block address each. Every other memory space of this size (shown as not-hatched area) is not occupied by compressed data and its content need not be transferred. Thus the number of block addresses that needs to be supplied to memory 10 will be halved. It will be understood that for other factors of compression other numbers of memory spaces may be left open.
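
The occupation pattern of FIG. 5 can be reproduced with a small sketch. The helper names `slots_occupied` and `occupation_map` are hypothetical, a "slot" stands for a memory space that can be transferred with one block address, and a constant compression ratio of at least one is assumed for every block, which is a simplification.

```python
import math


def slots_occupied(slots_per_block, compression_ratio):
    """Slots holding compressed data for one block, rounded up."""
    if compression_ratio < 1:
        raise ValueError("compression_ratio is assumed to be at least 1")
    return math.ceil(slots_per_block / compression_ratio)


def occupation_map(num_blocks, slots_per_block, compression_ratio):
    """Draw occupied slots as '#' and slots left open as '.'."""
    used = slots_occupied(slots_per_block, compression_ratio)
    return ("#" * used + "." * (slots_per_block - used)) * num_blocks


print(occupation_map(2, 2, 2))  # #.#.
print(occupation_map(1, 4, 4))  # #...
```

With a ratio of two and two slots per block, every other slot is left open and only one block address per block is needed.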

In principle the intermediate memory spaces left open to facilitate addressing with addresses in decompressed blocks may be empty of relevant data. However, without deviating from the invention other data may be stored in these intermediate spaces for use by other processes. Also copies of compressed data from other blocks may be stored in these intermediate spaces. In this case a lookahead can optionally be realized in some operations by loading data from the entire space between preferred addresses. But, of course this data in the intermediate spaces does not continue past the next preferred starting address where a next block of compressed data starts.

Furthermore, it should be understood that part of the decompressed data may be dummy data which is not dependent on the compressed data. As a result the number of datawords that are actually obtained using decompression from compressed data that is stored between two block addresses may in fact be smaller than the number of datawords between these two block addresses. Moreover, although the blocks of compressed data (optionally including length information) preferably start immediately from the preferred starting addresses, it will be understood that, without deviating from the invention an offset may be used. In this case the preferred starting address is still the starting address of the multi-address memory transfer, but some transferred data from the start of the transfer may be left unused for decompression. Similarly, it is possible to offset the end address of the multi-address transfer somewhat beyond the last address of the compressed block. A bandwidth gain is still realized as long as the transfer is terminated leaving some data up to the next preferred starting address untransferred.
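
The effect of a start offset and of a transfer that runs somewhat past the end of the compressed block can be checked with simple arithmetic, sketched here. The function `transfer_plan` and its parameter names are hypothetical; `distance` stands for the distance to the next preferred starting address.

```python
def transfer_plan(preferred_start, distance, offset, block_len, tail=0):
    """Describe one multi-address transfer for a block stored with an offset.

    offset: transferred words before the block that are left unused
    tail:   extra words transferred beyond the last word of the block
    """
    length = offset + block_len + tail
    if length > distance:
        raise ValueError("transfer would run past the next preferred start")
    first = preferred_start + offset
    return {
        "start": preferred_start,                # transfer still starts here
        "length": length,                        # words actually transferred
        "payload": (first, first + block_len),   # half-open range used
        "untransferred": distance - length,      # words saved on the bus
    }


plan = transfer_plan(preferred_start=128, distance=64,
                     offset=2, block_len=40, tail=3)
print(plan["length"], plan["untransferred"])  # 45 19
```

A bandwidth gain remains in this sketch whenever `untransferred` is greater than zero.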

Although the invention has been described in terms of processing elements that supply addresses of uncompressed data explicitly and compressors and decompressors that use the addresses supplied by the processing elements to address compressed blocks in memory, it will be appreciated that processing elements may address the data implicitly, for example by signalling “next” to the compressor or decompressor to indicate a change of address to an adjacent address (e.g. a pixel to the right or a later sample of a temporal signal). The invention is advantageous not only because addresses of uncompressed data can be translated into memory addresses of blocks of compressed data directly, but also because no data for unneeded blocks needs to be fetched that would have to be discarded in case of random access. No administration needs to be kept about the starting points of different blocks.
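
Implicit addressing by means of a “next” signal can be sketched as follows. The class `ImplicitCursor` is hypothetical; it assumes equal sized sub-ranges and compressed blocks stored at the same address distance as those sub-ranges, so that the block start is computed directly from the current address without any table of starting points.

```python
class ImplicitCursor:
    """Tracks an implicit data address advanced by 'next' signals."""

    def __init__(self, subrange_len, start_address=0):
        self.subrange_len = subrange_len
        self.address = start_address
        self.current_block = None  # start address of the block held so far

    def next(self):
        """Move to the adjacent address (e.g. the pixel to the right)."""
        self.address += 1

    def read(self):
        """Return (block_start, offset_in_block, new_block_needed)."""
        block_start = (self.address // self.subrange_len) * self.subrange_len
        new_block_needed = block_start != self.current_block
        self.current_block = block_start
        return block_start, self.address - block_start, new_block_needed


cursor = ImplicitCursor(subrange_len=4, start_address=2)
print(cursor.read())  # (0, 2, True)
cursor.next()
print(cursor.read())  # (0, 3, False)
cursor.next()
print(cursor.read())  # (4, 0, True)
```

A new multi-address transfer would be started only when `new_block_needed` is true; the transfer itself and the decompression are outside the scope of this sketch.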

Although the invention is preferably applied to compressed blocks that each represent data in a same sized sub-range of addresses of uncompressed data, it will be understood that without deviating from the invention different sized sub-ranges may be used for different blocks.

1. An apparatus for processing data-items each associated with a respective data address in a range of data addresses, wherein compressed blocks representing the data items are stored in a memory system, memory addresses occupied by each block starting from a respective preferred starting address for multi address transfer of the memory system, each block representing compressed data-items associated with data addresses in a respective sub-range of the range, the sub-ranges being successively contiguous, each particular sub-range having a length corresponding to an address distance between the preferred starting address from which addresses of the particular block that represents the data-items in the particular sub-range start and the preferred starting address from which addresses of a next one of the blocks for a next successive sub-range start, leaving memory addresses not occupied by the particular block in between blocks, the apparatus comprising the memory system, which is capable of performing selectable length multi-address memory transfers starting from the preferred starting addresses only, or with less overhead than starting from other addresses than the preferred starting addresses; a processing element for processing the data-items; a decompressor coupled between the processing element and the memory system, the decompressor being arranged to start a multi address memory transfer of a required one of the blocks from the memory system dynamically when the processing element requires access to the block, leaving memory addresses directly following the block up to a preferred starting address for a next one of the blocks untransferred in the transfer, and to decompress the data-items from the required one of the blocks before passing the data-items to the processing element.
2. An apparatus according to claim 1, wherein the processing element is arranged to indicate, to the decompressor, a decompression option selected from a series of different decompression options that require successively less addresses starting from the preferred starting address of the required one of the blocks to be transferred, the decompressor setting the length of the memory transfer dependent on the indicated decompression option.
3. An apparatus according to claim 1, wherein the decompressor is arranged to send a signal to the memory system to terminate the multi-address memory transfer of the required one of the blocks when a number of words, selected dependent on the length of the required one of the blocks, has been transferred.
4. An apparatus according to claim 3, wherein the decompressor is arranged to retrieve information representing the length of the required one of the blocks from the multi address memory transfer, the decompressor generating the signal dependent on said information.
5. An apparatus according to claim 1, wherein the decompressor is arranged to retrieve information representing the length of the required one of the blocks from a multi address memory transfer of a precedingly retrieved block, retrieved preceding the required one of the blocks and to send a transfer length selection signal to the memory system derived from the information at the start of the multi address memory transfer for the required one of the blocks.
6. An apparatus according to claim 1, wherein the lengths of the sub-ranges are mutually equal and larger than a distance between successive preferred starting addresses, the decompressor being arranged to start subsequent multi-address memory transfers for the required one of the blocks conditionally dependent on the length of the block.
7. An apparatus according to claim 6, wherein each block comprises a plurality of sub-blocks that are decompressible independently of one another, each sub-block corresponding to a respective equal sized part of the sub-range for the block, the decompressor comprising a buffer memory region, for buffering the sub-blocks of compressed data read during the multi-address memory transfer, an intermediate memory region for storing data decompressed from the sub-blocks successively, the decompressor replacing the decompressed data from respective sub-blocks read during the memory transfer with one another successively in the intermediate memory.
8. An apparatus according to claim 1, wherein the decompressor is arranged to apply decompression corresponding to lossy block compression.
9. An apparatus according to claim 1, wherein the decompressor is arranged to apply decompression corresponding to variable length block compression.
10. An apparatus according to claim 1, wherein the sub-ranges have mutually equal lengths.
11. An apparatus according to claim 1, comprising a compressor for compressing the data items associated with respective ones of the sub-ranges that has a length equal to the distance between a pair of preferred starting addresses, the compressor compressing the data items associated with a respective one of the sub-ranges each into a respective one of the blocks, the compressor being arranged to store the compressed blocks into the memory system using a respective multi-address memory transfer for each respective one of the blocks, each transfer starting from a respective one of the preferred starting addresses, the decompressor terminating the multi-address memory transfers upon completion of storing each block, without writing up to a next preferred starting address when not required for the block.
12. An apparatus according to claim 11, wherein the processing element computes the data-items for compression and the compressor is arranged to receive the data items for compression from the processing element.
13. An apparatus according to claim 11, wherein the compressor is arranged to adapt a compression ratio for compression of the data dependent on a dynamically measured level of available bandwidth for access to the memory system.
14. A method of processing a set of data-items, in which each data-item is associated with a respective data address in a range of data addresses, the method comprising providing a memory system that has memory addresses comprising a subset of equidistant preferred starting addresses from which multi-address memory transfers can be started exclusively, or with less overhead than from other addresses than the preferred starting addresses; storing compressed blocks in the memory system, addresses used for each respective one of the blocks starting from a respective one of the preferred starting addresses, each block representing compressed data-items associated with data addresses in a respective sub-range of the range, the sub-ranges being successively contiguous, each particular sub-range having a length corresponding to an address distance between the preferred starting address from which the particular block that represents the data-items in the particular sub-range starts and the preferred starting address from which a next one of the blocks for a next successive sub-range starts, leaving memory addresses not occupied by the particular block in between.
15. A method according to claim 14, comprising processing decompressed data-items derived from the blocks; retrieving a required one of the blocks from the memory system for said processing, by means of a multi-address memory transfer starting from the preferred starting address starting from which the required one of the blocks is stored; terminating the multi-address memory transfer for the required one of the blocks according to a length of the required one of the blocks, leaving content of memory addresses directly following addresses used for the required one of the blocks untransferred.
16. A method according to claim 14, comprising storing information representing a length of the required one of the blocks with the required one of the blocks in the memory system for transfer in the multi-address memory transfer.
17. A method according to claim 14, comprising storing information representing the length of the required one of the blocks with a logically preceding one of the blocks from which data-items are normally processed during said processing preceding data-items from the required one of the blocks, for transfer in a multi-address memory transfer for the logically preceding one of the blocks.
18. A method according to claim 17, comprising reading the information from the logically preceding one of the blocks; sending a transfer length selection signal selected dependent on the information to the memory system at the start of the multi address memory transfer for the required one of the blocks.
19. A method according to claim 14, wherein lossy block compression of uncompressed data is used to generate the blocks.
20. A method according to claim 14, wherein variable length block compression of uncompressed data is used to generate the blocks.
21. A method according to claim 20, wherein a compression ratio of the variable length block compression is dynamically adjusted according to dynamically available bandwidth for access to the memory system.
22. A computer program product comprising machine instructions for controlling memory transfers and decompression according to the method of claim 14.